2016-06-16

This archive contains replication files to accompany the paper, “Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-Recidivism Policies in Colombia”.  For questions, contact Cyrus Samii (cdsamii@gmail.com). The archive contains materials to replicate the simulation study as well as the application.    

I. A note on computational resources:  

The simulations and ensemble IPW estimation for the application were run on Intel Xeon E-2690v2 x86_64 3.0 GHz (2014) machines on the NYU High Performance Cluster (HPC).  Approximate job times and requested memory for those jobs are given in the file descriptions below.  The other Stata and R jobs were run on an iMac running OS X Yosemite with a 3.3 GHz Intel Core i5 processor and 32 GB of memory.

II. Description of files

Below I describe each of the files in the archive.  

- sims1.R, sims5.R, and sims10.R: R scripts for the simulation study.  These took about 20 hours to complete on the HPC, and the memory request was 10 GB.

- sims-results.R: R script to produce the graphs displaying simulation results.

- sim1.csv, sim5.csv, sim10.csv: Results from the simulations as reported in the paper.

- COLOMBIA_STEP8_Regressions_REDO.dta: The starting dataset for the application.

- Hypothetical_Interventions_REDO.xls: Excel spreadsheet showing definitions of hypothetical interventions.

- data-prep-wls-matching-naiveipw.do: Stata do file that creates the intervention variables and computes the alternative estimators that we compare to ensemble IPW in the paper.

- interv-pscore-revised-cluster-imp1-2.R, interv-pscore-revised-cluster-imp3-4.R, interv-pscore-revised-cluster-imp5-6.R, interv-pscore-revised-cluster-imp7-8.R, interv-pscore-revised-cluster-imp9-10.R: R scripts to compute ensemble IPW estimates on the 10 imputation-completed datasets.  These jobs each took about 10 hours to run on the HPC, and the memory request was 10 GB.

- int-out-all.xls:  Excel spreadsheet collecting all the estimates for the application as reported in the paper.

- int-out-pscore-plots.R: R script to create the propensity score (p-score) histograms.

- int-results-balance-tables.R: R script to create the covariate balance figures. 

- int-results-performance-metrics.R: R script to create the graphs showing the weight given to each algorithm in the SuperLearner.

- int-results-graph.R: R script to create the graph displaying the various estimates for the application.
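
Before running anything, it may help to confirm that the archive unpacked completely.  The following is a minimal sketch of a POSIX shell check that looks for each file named above.  The function name check_archive is hypothetical and not part of the archive, and the sketch assumes all files sit in a single directory.

```shell
#!/bin/sh
# Report which of the files listed in this README are missing from a directory.
# check_archive is a hypothetical helper name, not something shipped in the archive.
# Assumption: all files live in one flat directory (default: current directory).
check_archive() {
    dir="${1:-.}"
    missing=0
    for f in \
        sims1.R sims5.R sims10.R \
        sims-results.R \
        sim1.csv sim5.csv sim10.csv \
        COLOMBIA_STEP8_Regressions_REDO.dta \
        Hypothetical_Interventions_REDO.xls \
        data-prep-wls-matching-naiveipw.do \
        interv-pscore-revised-cluster-imp1-2.R \
        interv-pscore-revised-cluster-imp3-4.R \
        interv-pscore-revised-cluster-imp5-6.R \
        interv-pscore-revised-cluster-imp7-8.R \
        interv-pscore-revised-cluster-imp9-10.R \
        int-out-all.xls \
        int-out-pscore-plots.R \
        int-results-balance-tables.R \
        int-results-performance-metrics.R \
        int-results-graph.R
    do
        # Print the name of each file that is not present.
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

To use it, source the snippet and call check_archive with the path to the unpacked archive; the function prints one line per missing file and returns a nonzero status if any are absent.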


